AITopics | second derivative

Collaborating Authors

second derivative

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Robust volatility updates for Hierarchical Gaussian Filtering

Mathys, Christoph, Legrand, Nicolas, Waade, Peter Thestrup, Mikus, Nace, Weber, Lilian Aline

arXiv.org Machine Learning, May-5-2026

Hierarchical Gaussian Filtering (HGF) networks allow for efficient updating of posterior distributions (beliefs) about hidden states of an agent's environment. HGF parent nodes can target either the mean or the variance of their children. New information entering at input nodes triggers a cascade of belief updates across the network, governed by one-step update equations for each node's mean and precision (inverse variance). However, in some regions of parameter space the original update equations for variance-targeting parents (volatility coupling) can produce a negative posterior precision. This is a logical impossibility, and it causes the updating algorithm to terminate with an error. In this report, we introduce a modified quadratic approximation to the variational energy of volatility-coupled nodes that avoids negative posterior precision. The key idea is to interpolate between two quadratic expansions of the variational energy: one at the prior prediction and one at a second mode whose location is obtained in closed form via the Lambert W function. The resulting update equations are robust across the entire parameter space and faithfully track the variational posterior even for large prediction errors.
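The following minimal sketch shows how a quadratic expansion at the prior prediction can produce a negative precision. It rests on simplifying assumptions of our own. There is a single child whose predicted variance is `prev_var + exp(kappa * x + omega)` for parent state `x`. The prior on `x` is Gaussian with mean `mu_hat` and precision `pi_hat`. The value `d` summarizes the child's posterior variance plus its squared prediction error. All names are hypothetical. The precision of the expansion at `mu_hat` is minus the second derivative of the log-energy at that point. We derived the closed form below ourselves, and it can be checked against a finite-difference second derivative. The sketch illustrates the failure mode only. It is not the report's corrected update, which also expands at a second mode located via the Lambert W function.

```python
import math


def volatility_energy(x, mu_hat, pi_hat, kappa, omega, prev_var, d):
    """Log variational energy of a volatility parent at state x (up to a constant).

    Assumed setup: the child's predicted variance is prev_var + exp(kappa*x + omega),
    and d is the child's posterior variance plus its squared prediction error.
    """
    v = prev_var + math.exp(kappa * x + omega)
    return -0.5 * math.log(v) - 0.5 * d / v - 0.5 * pi_hat * (x - mu_hat) ** 2


def quadratic_precision_at_prior(mu_hat, pi_hat, kappa, omega, prev_var, d):
    """Precision of a quadratic expansion of volatility_energy at x = mu_hat.

    Closed form of minus the second derivative of the energy at mu_hat.
    """
    u = math.exp(kappa * mu_hat + omega)
    v = prev_var + u
    w = u / v                # share of the predicted variance owed to the parent
    delta = d / v - 1.0      # volatility prediction error
    return (pi_hat
            + 0.5 * (kappa * w) ** 2
            + (kappa * w) ** 2 * delta
            - 0.5 * kappa ** 2 * w * delta)


# Hypothetical parameter values chosen only to exhibit both regimes.
params = dict(mu_hat=0.0, pi_hat=1.0, kappa=1.0, omega=-2.0, prev_var=1.0)
print(quadratic_precision_at_prior(d=1.0, **params))    # small error: positive
print(quadratic_precision_at_prior(d=100.0, **params))  # large error: negative
```

The terms proportional to `delta` combine to `kappa**2 * w * delta * (w - 0.5)`. So when `w < 0.5`, a large positive prediction error can push the expanded precision below zero. A robust update has to rule this out.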

artificial intelligence, machine learning, variational energy, (17 more...)

arXiv.org Machine Learning

2605.00966

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neglected Hessian component explains mysteries in sharpness regularization

Neural Information Processing Systems, Feb-18-2026, 15:13:05 GMT

Sharpness-Aware Minimization (SAM) can improve generalization in deep learning, yet seemingly similar methods such as weight noise and gradient penalties often fail to provide such benefits. We investigate this inconsistency and reveal its connection to the structure of the Hessian of the loss.
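One standard way to expose the structure of a loss Hessian is the Gauss-Newton decomposition. For a squared-error loss, the Hessian splits into two parts. The first is a Gauss-Newton term built from products of model gradients, which is always positive semidefinite. The second is a residual-weighted term involving the model's second derivatives. The sketch below checks this decomposition on a hypothetical two-parameter model `f(x) = a * tanh(b * x)` with made-up data. It illustrates the decomposition only and is not the paper's analysis of SAM, weight noise, or gradient penalties.

```python
import math


def model(a, b, x):
    """Hypothetical two-parameter model f(x; a, b) = a * tanh(b * x)."""
    return a * math.tanh(b * x)


def loss(a, b, xs, ys):
    """Squared-error loss L(a, b) = 0.5 * sum_k (f(x_k) - y_k)^2."""
    return 0.5 * sum((model(a, b, x) - y) ** 2 for x, y in zip(xs, ys))


def hessian_parts(a, b, xs, ys):
    """Split the 2x2 Hessian of `loss` into Gauss-Newton and residual terms.

    Hessian = sum_k grad f grad f^T + sum_k r_k * Hess f, with r_k = f(x_k) - y_k.
    """
    gn = [[0.0, 0.0], [0.0, 0.0]]
    res = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        t = math.tanh(b * x)
        s2 = 1.0 - t * t                       # sech^2(b x)
        r = a * t - y                          # residual
        grad = [t, a * x * s2]                 # (df/da, df/db)
        hess_f = [[0.0, x * s2],               # second derivatives of f
                  [x * s2, -2.0 * a * x * x * s2 * t]]
        for i in range(2):
            for j in range(2):
                gn[i][j] += grad[i] * grad[j]  # positive semidefinite part
                res[i][j] += r * hess_f[i][j]  # can be indefinite
    return gn, res


xs, ys = [-1.0, 0.5, 2.0], [0.2, -0.4, 1.5]
gn, res = hessian_parts(1.3, 0.7, xs, ys)
hessian = [[gn[i][j] + res[i][j] for j in range(2)] for i in range(2)]
print(gn, res, hessian)
```

When the model fits the data exactly, every residual is zero and the Hessian reduces to its Gauss-Newton part. Away from such points, the second-derivative term contributes as well.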

artificial intelligence, information, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe > Austria (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Spontaneous Symmetry Breaking in Generative Diffusion Models

Gabriel Raya

Neural Information Processing Systems, Feb-17-2026, 06:06:10 GMT

Generative diffusion models have recently emerged as a leading approach for generating high-dimensional data.

artificial intelligence, machine learning, symmetry, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Another universality result for neural oscillators

Neural Information Processing Systems, Feb-15-2026, 22:14:25 GMT

The universal approximation Theorem 3.1 immediately implies another universal approximation … Thus $y(t)$ solves the ODE (2.6), with initial condition $y(0) = \dot{y}(0) = 0$. … Reconstruction of a continuous signal from its sine transform. … Step 0 (Equicontinuity): We recall the following fact from topology. … $F(\tau) := f(\tau)$ for $\tau \ge 0$, and $F(\tau) := -f(-\tau)$ for $\tau < 0$. Since $F$ is odd, the Fourier transform of $F$ is given by … We provide the details below. The next step in the proof of the fundamental Lemma 3.5 needs the following preliminary result in … By (B.3), this implies that … It follows from Lemma 3.4 that for any input … By the sine transform reconstruction Lemma B.1, there exists … It follows from Lemma 3.6 that there exists … Indeed, Lemma 3.7 shows that time delays of any given input signal can be approximated with any … Step 1: By the Fundamental Lemma 3.5, there exist … It follows from Lemma 3.6 that there exists an oscillator … Step 3: Finally, by Lemma 3.8, there exists an oscillator network, …
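The link between the odd extension of a signal and its sine transform can be checked numerically. This sketch assumes the convention $\hat F(\omega) = \int F(\tau) e^{-i\omega\tau}\,d\tau$. Under that convention, an odd $F$ built from $f$ on $[0,\infty)$ satisfies $\hat F(\omega) = -2i\int_0^\infty f(\tau)\sin(\omega\tau)\,d\tau$. The test signal $f(\tau) = \tau e^{-\tau}$, the truncation at `T`, and the trapezoidal quadrature are our own choices and are not taken from the paper.

```python
import cmath
import math


def f(tau):
    """Hypothetical test signal on [0, inf)."""
    return tau * math.exp(-tau)


def odd_extension(tau):
    """F(tau) = f(tau) for tau >= 0 and -f(-tau) for tau < 0."""
    return f(tau) if tau >= 0 else -f(-tau)


def fourier(func, omega, T=40.0, n=40000):
    """Trapezoidal approximation of the integral of func(t) * exp(-i omega t) over [-T, T]."""
    h = 2 * T / n
    total = 0j
    for k in range(n + 1):
        t = -T + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * func(t) * cmath.exp(-1j * omega * t)
    return total * h


def sine_transform(func, omega, T=40.0, n=20000):
    """Trapezoidal approximation of the integral of func(t) * sin(omega t) over [0, T]."""
    h = T / n
    total = 0.0
    for k in range(n + 1):
        t = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * func(t) * math.sin(omega * t)
    return total * h


omega = 1.0
F_hat = fourier(odd_extension, omega)
S = sine_transform(f, omega)
# For odd F, the cosine (real) part cancels and F_hat equals -2i * S.
print(F_hat, -2j * S)
```

For this particular signal, the sine transform at $\omega = 1$ is $2\omega/(1+\omega^2)^2 = 0.5$, so `F_hat` should come out close to `-1j`.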

artificial intelligence, lemma 3, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Piecewise Strong Convexity of Neural Networks

Tristan Milne

Neural Information Processing Systems, Feb-13-2026, 16:57:45 GMT

Neural Information Processing Systems http://nips.cc/

loss function, neural network, second derivative, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
North America > United States (0.04)
(2 more...)

Genre: Research Report > New Finding (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Variational Weighting for Kernel Density Ratios

Neural Information Processing Systems, Feb-7-2026, 22:44:34 GMT

Kernel density estimation (KDE) is integral to a range of generative and discriminative tasks in machine learning.
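A kernel density ratio can be formed in its simplest form by dividing two kernel density estimates. The sketch below shows that unweighted plug-in baseline. It assumes 1-D data, a Gaussian kernel, and a single hand-picked bandwidth shared by both estimates. The sample values are made up, and the sketch does not implement the paper's variational weighting.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a 1-D Gaussian kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def kde_ratio(numerator_samples, denominator_samples, bandwidth):
    """Plug-in estimate of p(x) / q(x) as a ratio of two unweighted KDEs."""
    p = gaussian_kde(numerator_samples, bandwidth)
    q = gaussian_kde(denominator_samples, bandwidth)

    def ratio(x):
        return p(x) / q(x)  # unstable where q(x) is tiny, e.g. in q's tails

    return ratio


p_samples = [-0.3, 0.1, 0.4, 0.9, 1.2]
q_samples = [-1.0, -0.2, 0.5, 1.5, 2.0]
r = kde_ratio(p_samples, q_samples, bandwidth=0.5)
print([round(r(x), 3) for x in (-1.0, 0.0, 1.0)])
```

Because the two densities are estimated separately, errors in the denominator's tails are amplified in the ratio. Weighting schemes for density ratios typically aim to address this weakness.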

artificial intelligence, data mining, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Netherlands > South Holland > The Hague (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Estimating Ising Models in Total Variation Distance

Daskalakis, Constantinos, Kandiros, Vardis, Yao, Rui

arXiv.org Machine Learning, Nov-27-2025

We consider the problem of estimating Ising models over $n$ variables in Total Variation (TV) distance, given $l$ independent samples from the model. While the statistical complexity of the problem is well understood [DMR20], identifying algorithms that are both computationally and statistically efficient has been challenging. Remarkable progress has been made in several special settings: when the underlying graph is a tree [DP21, BGPV21], when the entries of the interaction matrix follow a Gaussian distribution [GM24, CK24], or when the bulk of the interaction matrix's eigenvalues lies in a small interval [AJK+24, KLV24]. However, no unified framework for polynomial-time estimation in TV exists so far. Our main contribution is a unified analysis of the Maximum Pseudo-Likelihood Estimator (MPLE) for two general classes of Ising models. The first class includes models that have bounded operator norm and satisfy the Modified Log-Sobolev Inequality (MLSI), a functional inequality introduced to study the convergence of the associated Glauber dynamics to stationarity. In the second class, the interaction matrix has bounded infinity norm (or bounded width), which is the most common assumption in the literature on structure learning of Ising models. We show how our general results for these classes yield polynomial-time algorithms with optimal or near-optimal sample complexity guarantees in a variety of settings. Our proofs employ a range of tools, from tensorization inequalities to measure decompositions and concentration bounds.
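The following is a minimal sketch of a Maximum Pseudo-Likelihood Estimator for a small Ising model with ±1 spins. It assumes the parameterization $P(x) \propto \exp(\sum_{i<j} J_{ij} x_i x_j + \sum_i h_i x_i)$ with a symmetric `J` and zero diagonal. Under that parameterization each conditional is $P(x_i \mid x_{-i}) = \sigma(2 x_i m_i)$, where $m_i = h_i + \sum_{j \ne i} J_{ij} x_j$ is the local field. The function names, the toy samples, and the plain gradient-ascent optimizer are hypothetical choices. The sketch makes no attempt at the paper's TV-distance analysis or its sample-complexity guarantees.

```python
import math


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def local_fields(J, h, x):
    n = len(x)
    return [h[i] + sum(J[i][j] * x[j] for j in range(n) if j != i) for i in range(n)]


def pseudo_log_likelihood(J, h, samples):
    """Sum over samples and sites of log P(x_i | x_-i) = -softplus(-2 x_i m_i)."""
    total = 0.0
    for x in samples:
        m = local_fields(J, h, x)
        total -= sum(softplus(-2.0 * x[i] * m[i]) for i in range(len(x)))
    return total


def pll_gradient(J, h, samples):
    n = len(h)
    gJ = [[0.0] * n for _ in range(n)]
    gh = [0.0] * n
    for x in samples:
        m = local_fields(J, h, x)
        # Derivative of log sigmoid(2 x_i m_i) with respect to m_i.
        g = [2.0 * x[i] * sigmoid(-2.0 * x[i] * m[i]) for i in range(n)]
        for i in range(n):
            gh[i] += g[i]
            for j in range(i + 1, n):
                # The shared J_ij = J_ji enters m_i (via x_j) and m_j (via x_i).
                gJ[i][j] += g[i] * x[j] + g[j] * x[i]
                gJ[j][i] = gJ[i][j]
    return gJ, gh


def fit_mple(samples, n, steps=500, lr=0.05):
    """Gradient ascent on the average pseudo-log-likelihood (concave in J, h)."""
    J = [[0.0] * n for _ in range(n)]
    h = [0.0] * n
    for _ in range(steps):
        gJ, gh = pll_gradient(J, h, samples)
        for i in range(n):
            h[i] += lr * gh[i] / len(samples)
            for j in range(n):
                if i != j:
                    J[i][j] += lr * gJ[i][j] / len(samples)
    return J, h


# Toy data: spins 0 and 1 agree in 7 of 8 samples.
samples = [(1, 1, -1), (1, 1, 1), (-1, -1, 1), (-1, -1, -1),
           (1, 1, 1), (-1, -1, 1), (1, -1, -1), (1, 1, -1)]
J_hat, h_hat = fit_mple(samples, n=3)
print(J_hat, h_hat)
```

The pseudo-likelihood avoids the intractable partition function. Each factor needs only a single site's conditional, which is why MPLE is a natural candidate for polynomial-time estimation.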

ising model, matrix, probability, (15 more...)

arXiv.org Machine Learning

2511.21008

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Tyne and Wear > Sunderland (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)

Quadratic Term Correction on Heaps' Law

Fontanelli, Oscar, Li, Wentian

arXiv.org Artificial Intelligence, Nov-19-2025

Heaps' (or Herdan's) law describes the relation between the number of word types and the number of word tokens as a power-law function. This function is concave on a linear-linear scale but a straight line on a log-log scale. However, it has been observed that even on a log-log scale the type-token curve is still slightly concave, invalidating the power-law relation. Using twenty English novels and other writings (some translated into English from other languages), we show that at the next order of approximation, quadratic functions on a log-log scale fit the type-token data perfectly. Regressing log(type) on log(token) with both a linear and a quadratic term consistently yields a linear coefficient slightly larger than 1 and a quadratic coefficient of around -0.02. Using the "random drawing of colored balls from a bag with replacement" model, we show that the curvature of the log-log curve is identical to a "pseudo-variance", which is negative. The pseudo-variance calculation may become numerically unstable when the number of tokens is large, because of the large values of the pseudo-weights. Even so, this formalism provides a rough estimate of the curvature when the number of tokens is small.
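The regression described above can be sketched in plain Python. The code builds the cumulative type-token curve from a token sequence, then fits log(types) by least squares on log(tokens) and its square. It assumes the text is already tokenized into a list and uses unweighted least squares over every prefix length; the paper's exact preprocessing and fitting choices may differ. The synthetic check uses arbitrary coefficients, not values from the paper.

```python
import math


def type_token_curve(tokens):
    """Return (tokens seen so far, distinct types seen so far) for every prefix."""
    seen = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((n, len(seen)))
    return curve


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def fit_loglog_quadratic(curve):
    """Least squares for log(types) = c0 + c1*log(tokens) + c2*log(tokens)**2."""
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(v) for _, v in curve]
    # Normal equations: sum_j (sum_k x_k^(i+j)) c_j = sum_k y_k x_k^i
    A = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    return solve3(A, b)  # [c0, c1, c2]; c2 < 0 means concave in log-log


print(type_token_curve("the cat saw the dog".split()))
# [(1, 1), (2, 2), (3, 3), (4, 3), (5, 4)]

# Synthetic curve from a known quadratic (arbitrary coefficients) to check the fit.
synthetic = [(n, math.exp(0.3 + 1.1 * math.log(n) - 0.03 * math.log(n) ** 2))
             for n in range(1, 2001)]
print(fit_loglog_quadratic(synthetic))
```

On a real text, the fitted `c2` measures the residual concavity that a pure power law (which forces `c2 = 0`) cannot capture.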

machine learning, natural language, regression, (19 more...)

arXiv.org Artificial Intelligence

2511.14683

Country:

North America > United States (0.46)
Europe > Netherlands (0.28)

Genre: Research Report (0.90)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
